Bagging and Feature Selection for Classification with Incomplete Data

Authors

Cao Truong Tran

Mengjie Zhang

Peter Andreae

Bing Xue

Abstract

Missing values are an unavoidable issue in many real-world datasets. Dealing with missing values is an essential requirement in classification problems, because inadequate treatment of missing values often leads to large classification errors. Some classifiers can work directly with incomplete data, but they often produce large classification errors and generate complex models. Feature selection and bagging have been used successfully to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To achieve this, a wrapper-based feature selection method that can work directly with incomplete data is used to select suitable feature subsets for bagging. Experiments on eight incomplete datasets were designed to compare the proposed method with three other popular methods that can deal with incomplete data, using C4.5/REPTree as classifiers and Particle Swarm Optimisation as the search technique in feature selection. Results show that the combination of bagging and feature selection not only achieves better classification accuracy than the other methods but also generates less complex models than the bagging method.
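The general idea in the abstract (bagging in which each ensemble member is trained on a feature subset chosen by a wrapper that tolerates missing values) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a nearest-centroid base classifier in place of C4.5/REPTree, a plain random search in place of Particle Swarm Optimisation, and `None` as the missing-value marker. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # missing-value marker (an assumption of this sketch)

def fit_centroids(X, y, feats):
    """Per-class mean of each selected feature, skipping missing values."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cent = {}
        for f in feats:
            vals = [r[f] for r in rows if r[f] is not MISSING]
            cent[f] = sum(vals) / len(vals) if vals else MISSING
        model[label] = cent
    return model

def predict_one(model, x, feats):
    """Nearest centroid over features observed in both x and the centroid."""
    labels = sorted(model)
    best, best_d = labels[0], float("inf")  # fallback: smallest label
    for label in labels:
        cent = model[label]
        ds = [(x[f] - cent[f]) ** 2 for f in feats
              if x[f] is not MISSING and cent[f] is not MISSING]
        d = sum(ds) / len(ds) if ds else float("inf")
        if d < best_d:
            best, best_d = label, d
    return best

def accuracy(model, X, y, feats):
    hits = sum(predict_one(model, x, feats) == t for x, t in zip(X, y))
    return hits / len(y)

def select_features(X, y, n_feats, rng, trials=20):
    """Wrapper selection: score candidate subsets with the classifier itself.
    Random search stands in for PSO; accuracy is measured on the training
    sample for brevity, where a held-out split would be more robust."""
    best_feats, best_acc = list(range(n_feats)), -1.0
    for _ in range(trials):
        k = rng.randint(1, n_feats)
        feats = sorted(rng.sample(range(n_feats), k))
        acc = accuracy(fit_centroids(X, y, feats), X, y, feats)
        if acc > best_acc:
            best_feats, best_acc = feats, acc
    return best_feats

def bagging_fs(X, y, n_models=5, seed=0):
    """Train one (model, feature subset) pair per bootstrap sample."""
    rng = random.Random(seed)
    n, n_feats = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = select_features(Xb, yb, n_feats, rng)
        ensemble.append((fit_centroids(Xb, yb, feats), feats))
    return ensemble

def predict(ensemble, x):
    """Majority vote across ensemble members."""
    votes = Counter(predict_one(m, x, f) for m, f in ensemble)
    return votes.most_common(1)[0][0]

# Toy incomplete dataset (made up for illustration only).
X = [[1.0, 5.0, None], [1.2, None, 0.3], [0.9, 4.8, 0.1], [None, 5.1, 0.2],
     [3.0, 1.0, None], [3.2, None, 0.2], [2.9, 1.2, 0.4], [None, 0.9, 0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging_fs(X, y)
print([(feats, predict_one(m, X[0], feats)) for m, feats in ensemble])
```

Because each member keeps only its selected features, the ensemble members tend to be smaller than those of plain bagging on all features, which is the intuition behind the abstract's claim about model complexity.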

Similar resources

Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines

In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data is still an issue, and finding specific documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...

A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...

Publication date: 2017
